{ "cells": [ { "cell_type": "markdown", "id": "200badbc-9db1-45f8-ab23-b55a36f786de", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "# Loading Chromatogram Data" ] }, { "cell_type": "raw", "id": "7f6ada18-3d66-4064-a58c-5e36735be757", "metadata": { "editable": true, "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ ".. currentmodule:: massdash" ] }, { "cell_type": "raw", "id": "aef105c5-fa5e-4a15-9040-66af75bb9f32", "metadata": { "editable": true, "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "A Chromatogram Loader loads raw data from chromatograms. This leads to faster data loading since no on the fly extraction is needed however this leads to less flexibility. " ] }, { "cell_type": "code", "execution_count": 1, "id": "8a402b3f-e106-49b8-af66-cd95618581dd", "metadata": { "editable": true, "nbsphinx": "hidden", "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "code", "execution_count": 2, "id": "332bb4b3-e639-43e9-aab6-55d788215940", "metadata": { "editable": true, "nbsphinx": "hidden", "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "# Please run this before executing any cell\n", "import os\n", "os.chdir(\"../../test/test_data/\") #### Insert path to data, this is the path to the tutorial data. " ] }, { "cell_type": "markdown", "id": "4956e403-6780-428a-9d5d-49ab28f5eb01", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "## Initiating a Chromatogram Loader" ] }, { "cell_type": "raw", "id": "88d418b1-ebd1-4e87-a8af-06d6b3620a06", "metadata": { "editable": true, "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "All Chromatogram Loaders require the following inputs. " ] }, { "cell_type": "raw", "id": "dd832354-db54-4d65-b1aa-e362921f6e1c", "metadata": { "editable": true, "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "\n", "1. dataFiles - a list of raw data files \n", "2. rsltsFile - a file containing the features" ] }, { "cell_type": "raw", "id": "1e3e2f4b-4e97-463c-bb4c-8ab42ea383e4", "metadata": { "editable": true, "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "In the case of an :py:class:`~loaders.SqMassLoader` the data files must be a list of `.sqMass` files and the rsltsFile must be a `.osw` merged file output from pyprophet. This output is useful to visualize because it shows the exact Chromatogram which the OpenSwath peak picking uses for peak picking. " ] }, { "cell_type": "markdown", "id": "7217abb6-acb6-4f2b-875a-e34194fb8254", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "We can initiate a SqMassLoader object with multiple sqMass files as follows. " ] }, { "cell_type": "code", "execution_count": 3, "id": "bc1bc445-aabb-4e39-8d87-e63ee1fb1961", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "from massdash.loaders import SqMassLoader\n", "loader = SqMassLoader(dataFiles=[\"xics/test_chrom_1.sqMass\", \"xics/test_chrom_2.sqMass\"],\n", " rsltsFile=\"osw/test_data.osw\")" ] }, { "cell_type": "markdown", "id": "0b6ffb10-9b61-4252-b16c-dfc98d3407a0", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "
\n", "\n", "Note\n", "\n", "The provided `.osw` file must contain information for all runs.\n", "\n", "
" ] }, { "cell_type": "markdown", "id": "526eceb9-9cb3-4b87-b5be-80a0240aba92", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "Priting the loader objects shows the file paths for all of the linked files" ] }, { "cell_type": "code", "execution_count": 4, "id": "30d2b59c-d987-4ae6-b680-412b294b42da", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "SqMassLoader(rsltsFile=osw/test_data.osw, dataFiles=['xics/test_chrom_1.sqMass', 'xics/test_chrom_2.sqMass']" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "loader" ] }, { "cell_type": "markdown", "id": "99d6e296-f7c3-4092-bbd6-7e48c2cdb01a", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "## Loading a Transition Group" ] }, { "cell_type": "raw", "id": "373e50d3-89b8-41e7-8239-5674229184f9", "metadata": { "editable": true, "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "To fetch the chromatograms for a particular transitionGroup, we can call the :py:func:`~loaders.SqMassLoader.loadTransitionGroups` method. This method requires a modified peptide sequence and a charge state and will load the transition groups across all runs. This method can take a while since it is fetching the data across all experiments from disk. " ] }, { "cell_type": "markdown", "id": "4c51615f-49dc-41b2-807b-21e3705d5df6", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "In this example we will visualize the peptide *NKESPT(UniMod:21)KAIVR(UniMod:267)* with a charge state of *3*" ] }, { "cell_type": "code", "execution_count": 16, "id": "0c439966-fe83-4b96-8010-b4e44c6feed2", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "TransitionGroupCollection\n", "test_chrom_1: -------- TransitionGroup --------\n", "precursor data: 1\n", "transition data: 6\n", "data type: Chromatogram\n", "test_chrom_2: -------- TransitionGroup --------\n", "precursor data: 1\n", "transition data: 6\n", "data type: Chromatogram" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "loader.loadTransitionGroups(\"NKESPT(UniMod:21)KAIVR(UniMod:267)\", 3)" ] }, { "cell_type": "raw", "id": "f7352626-9660-4641-897f-db0a42164500", "metadata": { "editable": true, "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "Here we have a :py:class:`~structs.TransitionGroupCollection` which is a dictionary where the file keys are the runnames (each one corresponding with a different SQMass connector) and the values are a :py:class:`~structs.TransitionGroup`. The :py:class:`~structs.TransitionGroup` object holds a series of chromatograms belonging to the same precursor. " ] }, { "cell_type": "markdown", "id": "3ae9e823-8a13-4985-8887-bfdfe4ac49db", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "### Loading Chromatogram Data as a Pandas DataFrame" ] }, { "cell_type": "raw", "id": "cf18f0d5-9449-43cd-99c5-efc4b6f124ea", "metadata": { "editable": true, "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "Alternatively for analysis directly in python the loaders can return a pandas dataframe object contianing all of the points for this peptide by using the :py:func:`~loaders.SqMassLoader.loadTransitionGroups` method instead." ] }, { "cell_type": "code", "execution_count": 17, "id": "0c068fd8-f2d6-4956-81c8-de40bc76afde", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
runrtintensityannotation
0test_chrom_1512.81069.0519082274_Precursor_i0
1test_chrom_1516.42230.9825972274_Precursor_i0
2test_chrom_1520.02583.0569212274_Precursor_i0
3test_chrom_1523.71876.9552762274_Precursor_i0
4test_chrom_1527.31862.1266032274_Precursor_i0
...............
2697test_chrom_21251.00.000000b4^1
2698test_chrom_21254.742.001872b4^1
2699test_chrom_21258.320.999608b4^1
2700test_chrom_21261.920.999608b4^1
2701test_chrom_21265.60.000000b4^1
\n", "

2702 rows × 4 columns

\n", "
" ], "text/plain": [ " run rt intensity annotation\n", "0 test_chrom_1 512.8 1069.051908 2274_Precursor_i0\n", "1 test_chrom_1 516.4 2230.982597 2274_Precursor_i0\n", "2 test_chrom_1 520.0 2583.056921 2274_Precursor_i0\n", "3 test_chrom_1 523.7 1876.955276 2274_Precursor_i0\n", "4 test_chrom_1 527.3 1862.126603 2274_Precursor_i0\n", "... ... ... ... ...\n", "2697 test_chrom_2 1251.0 0.000000 b4^1\n", "2698 test_chrom_2 1254.7 42.001872 b4^1\n", "2699 test_chrom_2 1258.3 20.999608 b4^1\n", "2700 test_chrom_2 1261.9 20.999608 b4^1\n", "2701 test_chrom_2 1265.6 0.000000 b4^1\n", "\n", "[2702 rows x 4 columns]" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "transitionGroupDf = loader.loadTransitionGroupsDf(\"NKESPT(UniMod:21)KAIVR(UniMod:267)\", 3)\n", "transitionGroupDf" ] }, { "cell_type": "markdown", "id": "0cf285b2-63a0-4a8f-b377-77b5a5001bb3", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "This dataframe has all of the intensities and retention times for all of the files across all transitions. Transitions can be diffrentiated by the *annotation* column and the *filename* column diffrentiates the file/run in which the chromatograms originate from. " ] }, { "cell_type": "markdown", "id": "92c5601a-5e39-49de-b3da-7355795128cd", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "For example to get the total intensity across the intensities we can use the pandas `groupby()` functions " ] }, { "cell_type": "code", "execution_count": 18, "id": "efb996ca-8bd1-4dcc-b2ff-2664a12fa16d", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
intensity
runannotation
test_chrom_12274_Precursor_i02.139805e+06
b4^13.000697e+04
y1^11.300780e+05
y2^12.837481e+04
y3^13.879062e+05
y4^11.295312e+05
y5^15.707377e+04
test_chrom_22274_Precursor_i05.931736e+05
b4^17.226959e+03
y1^11.137597e+04
y2^11.631975e+05
y3^14.025936e+04
y4^13.567035e+03
y5^11.758796e+04
\n", "
" ], "text/plain": [ " intensity\n", "run annotation \n", "test_chrom_1 2274_Precursor_i0 2.139805e+06\n", " b4^1 3.000697e+04\n", " y1^1 1.300780e+05\n", " y2^1 2.837481e+04\n", " y3^1 3.879062e+05\n", " y4^1 1.295312e+05\n", " y5^1 5.707377e+04\n", "test_chrom_2 2274_Precursor_i0 5.931736e+05\n", " b4^1 7.226959e+03\n", " y1^1 1.137597e+04\n", " y2^1 1.631975e+05\n", " y3^1 4.025936e+04\n", " y4^1 3.567035e+03\n", " y5^1 1.758796e+04" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "transitionGroupDf[['intensity', 'run', 'annotation']].groupby(['run', 'annotation']).sum()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.18" } }, "nbformat": 4, "nbformat_minor": 5 }